User guide
This guide provides an overview of a simple geostatistical workflow, which consists of of three basic steps:
- Manipulation of spatial data
- Definition of geostatistical problem
- Visualization of problem solution
These steps are only shown to guide the reader, which is free to follow his/her own workflow.
Manipulating data
The workflow starts with the creation of spatial data objects, which can be loaded from disk or derived from other Julia variables. For example, given a Julia array (or image), which is not attached to any particular coordinate system:
using Plots
Z = [10sin(i/10) + j for i in 1:100, j in 1:200]
heatmap(Z, c=:bluesreds)
we can georeference the image as RegularGridData:
using GeoStats
Ω = RegularGridData{Float64}(Dict(:Z=>Z))100×200 RegularGridData{Float64,2}
origin: (0.0, 0.0)
spacing: (1.0, 1.0)
variables
└─Z (Float64)The origin and spacing of samples in the regular grid can be specified in the constructor:
RegularGridData(Dict(:Z=>Z), (1.,1.), (10.,10.))100×200 RegularGridData{Float64,2}
origin: (1.0, 1.0)
spacing: (10.0, 10.0)
variables
└─Z (Float64)and different spatial data types have different constructor options (see Data for more options).
We plot the spatial data and note a few differences compared to the image plot shown above:
plot(Ω)
First, we note that the image was rotated to match the first index i of the array with the horizontal "x" axis, and the second index j of the array with the vertical "y" axis. Second, we note that the image was stretched to reflect the real 100x200 size of the regular grid data.
Each sample in the spatial data object has a coordinate, which is calculated on demand for a given list of locations (i.e. spatial indices):
coordinates(Ω, 1:3)2×3 Array{Float64,2}:
0.0 1.0 2.0
0.0 0.0 0.0In-place versions exist to avoid unnecessary memory allocations.
All coordinates are retrieved as a ndims x npoints matrix when we do not specify the spatial indices:
coordinates(Ω)2×20000 Array{Float64,2}:
0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 … 95.0 96.0 97.0 98.0 99.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 199.0 199.0 199.0 199.0 199.0Tabular access
Spatial data types implement the Tables.jl interface, which means that they can be accessed as if they were tables with samples in the rows and variables in the columns:
Ω[1:3,:Z]3-element Array{Float64,1}:
1.9983341664682817
2.9866933079506124
3.9552020666133956In this case, the coordinates of the samples are lost. To reconstruct a spatial data object, we need to save the spatial indices that were used to index the table:
zvals = Ω[1:3,:Z]
coord = coordinates(Ω, 1:3)
PointSetData(Dict(:Z=>zvals), coord)3 PointSetData{Float64,2}
variables
└─Z (Float64)To recover the original Julia array behind a spatial data object, we can index the data with the variable name. In this case, the size of the array (i.e. 100x200) is preserved:
Ω[:Z]100×200 Array{Float64,2}:
1.99833 2.99833 3.99833 … 197.998 198.998 199.998 200.998
2.98669 3.98669 4.98669 198.987 199.987 200.987 201.987
3.9552 4.9552 5.9552 199.955 200.955 201.955 202.955
4.89418 5.89418 6.89418 200.894 201.894 202.894 203.894
5.79426 6.79426 7.79426 201.794 202.794 203.794 204.794
6.64642 7.64642 8.64642 … 202.646 203.646 204.646 205.646
7.44218 8.44218 9.44218 203.442 204.442 205.442 206.442
8.17356 9.17356 10.1736 204.174 205.174 206.174 207.174
8.83327 9.83327 10.8333 204.833 205.833 206.833 207.833
9.41471 10.4147 11.4147 205.415 206.415 207.415 208.415
⋮ ⋱
3.2289 4.2289 5.2289 199.229 200.229 201.229 202.229
2.24454 3.24454 4.24454 198.245 199.245 200.245 201.245
1.24775 2.24775 3.24775 197.248 198.248 199.248 200.248
0.248489 1.24849 2.24849 196.248 197.248 198.248 199.248
-0.743268 0.256732 1.25673 … 195.257 196.257 197.257 198.257
-1.71761 -0.717606 0.282394 194.282 195.282 196.282 197.282
-2.66479 -1.66479 -0.664791 193.335 194.335 195.335 196.335
-3.57536 -2.57536 -1.57536 192.425 193.425 194.425 195.425
-4.44021 -3.44021 -2.44021 191.56 192.56 193.56 194.56 Spatial views
Spatial data types can be viewed at a subset of locations without unnecessary memory allocations. Spatial views do not preserve the spatial regularity of the data in general.
By plotting a view of the first 10 lines of our regular grid data, we obtain a general PointSetData as opposed to a RegularGridData:
Ωᵥ = view(Ω, 1:10*100)
plot(Ωᵥ)
We plot a random view of the grid to emphasize that views do not preserve spatial regularity:
inds = rand(1:npoints(Ω), 100)
plot(view(Ω, inds))
Data partitions
Spatial data objects can be partitioned with various efficient methods. To demonstrate the operation, we partition our spatial data view into balls of given radius:
Π = partition(Ωᵥ, BallPartitioner(5.))
plot(Π)
or, alternatively, into two halfspaces:
Π = partition(Ωᵥ, BisectFractionPartitioner((1.,1.), 0.5))
plot(Π)
Spatial partitions are (lazy) iterators of spatial views, which are useful in many contexts as it will be shown in the next section of the user guide. To access a subset of a partition, we use index notation:
plot(Π[1])
plot(Π[2])
Defining problems
Having defined the spatial data objects, we proceed and define the geostatistical problem to be solved. In this guide, we illustrate geostatistical learning. For other types of geostatistical problems, please check the Problems section of the documentation.
Let's assume that we have spatial data with some variable that we want to predict in a supervised learning setting. We load the data from a CSV file, and inspect the available columns:
using CSV
df = CSV.read("data/agriculture.csv")
first(df, 5)Columns band1, ..., band4 represent four satellite bands for different locations (x,y) in this spatial region. The column crop has the crop type for each location that was labeled manually with the purpose of training a learning model.
Because the labels are categorical variables, we need to inform the framework the correct categorical type:
using CategoricalArrays
df.crop = categorical(df.crop)
first(df, 5)We can now georeference the data as a GeoDataFrame and plot some of the spatial variables:
Ω = GeoDataFrame(df, [:x,:y])
gr(format=:png) # hide
plot(Ω, variables=[:band4,:crop], ms=0.5, mc=:viridis, size=(800,400))Similar to a generic statistical learning workflow, we split the data into "train" and "test" sets. The main difference here is that our spatial split function accepts a separating plane specified by its normal direction (1,-1):
Ωs, Ωt = split(Ω, 0.2, (1.,-1.))
plot(domain(Ωs), ms=0.5)
plot!(domain(Ωt), ms=0.5, mc=:green)
We can visualize the domain of the "train" (or source) set Ωs in black, and the domain of the "test" (or target) set Ωt in green. We reserved 20% of the samples to Ωs and 80% to Ωt.
Let's define the learning task and the geostatistical learning problem. We want to predict the crop type based on the four satellite bands. We will train the model in Ωs where labels are available, and apply it to Ωt, which is our target:
features = [:band1,:band2,:band3,:band4]
label = :crop
task = ClassificationTask(features, label)
problem = LearningProblem(Ωs, Ωt, task)GeoStats.jl is integrated with the MLJ.jl project, which means that we can solve geostatistical learning problems with any classical learning model:
using MLJ
@load DecisionTreeClassifier
model = DecisionTreeClassifier()
solver = PointwiseLearn(model)PointwiseLearn
└─model ⇨ [34mDecisionTreeClassifier @ 1…60[39m
In this example, we selected a PointwiseLearn strategy to solve the geostatistical learning problem. This strategy consists of applying the learning model pointwise for every point in the spatial data:
Ω̂t = solve(problem, solver)Plotting solutions
First, we note that the solution to a geostatistical learning problem is a spatial data object. We can inspect it with the same methods already described:
Ω̂t[1:5,:crop]This also means that we can plot the solution directly, side by side with the true label in this synthetic example:
p̂ = plot(Ω̂t, ms=0.5, mc=:viridis, title="crop (prediction)")
p = plot(Ωt, variables=[:crop], ms=0.5, mc=:viridis)
plot(p̂, p, size=(800,400))Visually, the learning model has succeeded predicting the crop. We can also estimate the generalization error of the geostatistical solver with spatial validation methods such as block cross-validation and leave-ball-out, but these methods deserve a separate tutorial.
With this example we conclude the user guide. To get familiar with other features of GeoStats.jl, please check the Tutorials and the Reference guide.